Distributed Speculations: Providing Fault-tolerance and Improving Performance
Author
Abstract
This thesis introduces a new programming model based on speculative execution and examines the use of speculations, a form of distributed transactions, for improving the performance, reliability, and fault tolerance of distributed systems. A speculation is a computation based on an assumption that has not been validated when the computation starts. If the assumption is later invalidated, the computation is aborted and the state of the program is rolled back; if the assumption is validated, the results of the computation are committed. The primary difference between a speculation and a transaction is that a speculation is not isolated: for example, a speculative computation may send and receive messages, and it may modify shared objects. As a result, processes that share those objects may be absorbed into a speculation. The contributions presented in this thesis include:
• the introduction of a new programming model based on speculations,
• the definition of new speculative programming language constructs,
• the formal specification of the semantics of various speculative execution models, including message passing and shared objects,
• the implementation of speculations in the Linux kernel in a transparent manner, and
• the design and implementation of components of a distributed filesystem that supports speculations and guarantees sequential consistency of concurrent accesses to files.
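The commit/abort behaviour described in the abstract can be illustrated with a minimal single-process sketch. This is not the thesis's kernel-level implementation or its language constructs: the `Speculation` class and the `run_speculatively` helper are hypothetical names introduced here, the checkpoint is a plain deep copy of a dictionary, and the sketch does not model the non-isolated aspects (message passing, or other processes being absorbed into the speculation).

```python
import copy


class Speculation:
    """Hypothetical helper: compute on an unvalidated assumption, then commit or abort."""

    def __init__(self, state):
        self.state = state                        # mutable dict, modified in place
        self._checkpoint = copy.deepcopy(state)   # snapshot taken when the speculation starts
        self.active = True

    def commit(self):
        # Assumption validated: keep the speculative changes and drop the snapshot.
        self._checkpoint = None
        self.active = False

    def abort(self):
        # Assumption invalidated: roll the state back to the snapshot.
        self.state.clear()
        self.state.update(self._checkpoint)
        self._checkpoint = None
        self.active = False


def run_speculatively(state, compute, validate):
    """Run compute(state) before the assumption is checked; return True if committed."""
    spec = Speculation(state)
    compute(state)          # speculative work happens first
    if validate():          # the assumption is only validated afterwards
        spec.commit()
        return True
    spec.abort()
    return False


def withdraw_30(state):
    state["balance"] -= 30


aborted_account = {"balance": 100}
aborted_ok = run_speculatively(aborted_account, withdraw_30, lambda: False)

committed_account = {"balance": 100}
committed_ok = run_speculatively(committed_account, withdraw_30, lambda: True)
```

In the aborted run the dictionary is restored to its checkpointed contents, while in the committed run the withdrawal is kept. A distributed version would additionally have to track which processes observed speculative state so that they can be rolled back together.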
Similar resources
Speculations: Providing Fault-tolerance and Recoverability in Distributed Environments
Building safe and reliable programs is an important but difficult endeavor. The challenge is even greater in the context of distributed environments, which may involve complex synchronization operations in the presence of process and network failures. Transactions are one of the earliest and simplest abstractions for reliable concurrent programming [2]. They provide fault-isolation by guarantee...
Improving the palbimm scheduling algorithm for fault tolerance in cloud computing
Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...
A Theory of Nested Speculative Execution
Implementing distributed applications is a challenging task. Developers of such systems are confronted with issues like fault-tolerance, efficient synchronization mechanisms, and the correctness of the distributed code. This paper introduces a new programming model based on speculative execution that addresses these issues. Speculations provide distributed atomic rollback and enable optimistic ...
The performance of independent checkpointing in distributed systems
This paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a...
Improving Performance in Adaptive Fault Tolerance Structure with investigating the effect of the number of replication
Given the wide use of distributed systems in various areas, fault tolerance is an important characteristic of such systems, and it matters even more in the design of real-time distributed systems. With regard to using some middleware like CORBA in designing such systems, and in order to increase their compatibility, speed, performance, to simplify the network pro...